Abstract

Inferring on others' (potentially time-varying) intentions is a fundamental problem during many social transactions. To 
investigate the underlying mechanisms, we applied computational modeling to behavioral data from an economic game in 
which 16 pairs of volunteers (randomly assigned to "player" or "adviser" roles) interacted. The player performed a 
probabilistic reinforcement learning task, receiving information about a binary lottery from a visual pie chart. The adviser, 
who received more predictive information, issued an additional recommendation. Critically, the game was structured such 
that the adviser's incentives to provide helpful or misleading information varied in time. Using a meta-Bayesian modeling 
framework, we found that the players' behavior was best explained by the deployment of hierarchical learning: they inferred 
upon the volatility of the advisers' intentions in order to optimize their predictions about the validity of their advice. Beyond 
learning, volatility estimates also affected the trial-by-trial variability of decisions: participants were more likely to rely on 
their estimates of advice accuracy for making choices when they believed that the adviser's intentions were presently stable. 
Finally, our model of the players' inference predicted the players' interpersonal reactivity index (IRI) scores, explicit ratings of 
the advisers' helpfulness and the advisers' self-reports on their chosen strategy. Overall, our results suggest that humans (i) 
employ hierarchical generative models to infer on the changing intentions of others, (ii) use volatility estimates to inform 
decision-making in social interactions, and (iii) integrate estimates of advice accuracy with non-social sources of information. 
The Bayesian framework presented here can quantify individual differences in these mechanisms from simple behavioral 
readouts and may prove useful in future clinical studies of maladaptive social cognition. 
Introduction

The process of how we represent others' intentions is an 
important determinant of social exchange. This inferential process 
becomes even more crucial when we need to rely on other people's 
advice regarding a course of action. Credibility can be inferred 
from another's reputation, which is in turn developed through 
recursive social interactions [1,2]. But since advice is motivated by 
unknown goals, which may also change in time, we are constantly 
challenged by the question of how accurately we represent others' 
intentions. 

As agents' intentions are hidden from observers, they have to be 
inferred from their actions. The monitoring of other agents'
intentions represents a particular aspect of "theory of mind" [3-5].
Different cognitive frameworks for understanding this process have
been suggested, e.g. action understanding vs. mentalizing (attribution
of mental states) [6-8]. Bayesian models in particular
provide a formal account of how observers build models of other 
agents and use them to predict their desires or intentions. One 



important approach is to formulate social cognition in terms of a 
partially observable Markov decision process (POMDP) that 
describes the relations between environmental states (accessible 
to the observer) and another agent's (unobservable) mental states 
[9-11]. This conceptualization, however, tends to be normative 
and does not usually emphasize individual variability in social 
inference. Another framework proposes that theory of mind can be 
understood in terms of recursive thinking, and focuses on 
identifying the depth of reasoning that leads to optimal inference 
[2,12,13]. Importantly, so far both types of approaches have been
applied to situations where the other agents' intentions are stable 
over time. 

In the present study we build on these previous computational 
treatments of how humans infer on the intentions of others by 
considering the additional challenge of detecting how quickly they 
change in time, i.e. volatility. To this end, we propose novel 
generative models of how humans may infer on volatile intentions 
of others and apply these models to behavioral data from a new 
experimental paradigm. The models we employ are conceptually 
Author Summary

The ability to decode another person's intentions is a 
critical component of social interactions. This is particularly
important when we have to make decisions based on 
someone else's advice. Our research proposes that this 
complex cognitive skill (social learning) can be translated 
into a mathematical model, which prescribes a mechanism 
for mentally simulating another person's intentions. This 
study demonstrates that this process can be parsimoni- 
ously described as the deployment of hierarchical learning. 
In other words, participants learn about two quantities: the 
intentions of the person they interact with and the veracity 
of the recommendations they offer. As participants 
become more and more confident about their representation
of the other's intentions, they make decisions more
in accordance with the advice they receive. Importantly, 
our modeling framework captures individual differences in 
the social learning process: The estimated "learning 
fingerprint" can predict other aspects of participants' 
behavior, such as their perspective-taking abilities and 
their explicit ratings of the adviser's level of trustworthi- 
ness. The present modeling approach can be further 
applied in the context of psychiatry to identify maladap- 
tive learning processes in disorders where social learning 
processes are particularly impaired, such as schizophrenia.



similar to previous POMDP models, but emphasize individual 
approximations to Bayes-optimality, as described below. 

Specifically, we addressed the following two questions by 
comparing the explanatory power of alternative computational 
models that were fitted to the observed behavior: (i) Are humans 
able to deploy hierarchically structured learning during social 
interactions and simultaneously predict the accuracy of advice and 
the stability of the adviser's intentions? (ii) Would humans rely 
more on social advice (with potentially high information but also 
unknown degree of uncertainty) or on non-social information that 
is potentially less accurate but had a known outcome distribution 
(i.e., risk)? 

To address these questions, we designed an interactive and 
deception-free economic game that involved situations of both 
aligned and conflicting interests between participants (all male)
who were randomly assigned to a "player" or an "adviser" role. In 
this social exchange paradigm, which builds on a previous task by 
Behrens et al. (2008), participants received distinct information 
about the probability of two possible outcomes. The player had to 
predict the outcome of a binary lottery whose true probability 
distribution was displayed as a pie chart. The adviser issued an 
additional recommendation (advice) to the player. The informa- 
tion available to the adviser was still probabilistic, but with a larger 
and constant probability (80%) as it was generated after the 
outcome had been drawn (Figure 1). 

Importantly, the adviser's payment was structured such that his
incentive to provide valid or misleading advice varied during the 
game and introduced temporal variations in aligned and 
conflicting interests between player and adviser. This required 
the player to detect changes in the adviser's intentions and adapt 
his own decision-making accordingly. It is not clear, however, 
what exact mechanism underlies adaptive behavior in this 
scenario: would players only track trial-wise changes in advice 
accuracy, or would they invoke a more complex hierarchical 
model, which also assumes that players track the volatility of the 
advisers' intentions (see [14])? Furthermore, even if the latter was 
the case, would volatility estimates only serve to optimize inference 



and learning, or would they directly impact on trial-by-trial 
variability of decisions? 

To address these questions, we considered different explana- 
tions (hypotheses) for the behavior displayed by our participants, 
each of which was formalized as a two-component model. The first 
component of each model represented the player's belief updating
about the causes of the adviser's behavior; we refer to this
component as the "perceptual model". The second component is
the "response model", which maps the current belief to the
player's actual decision (see [15,16]). We constructed a factorially
structured set of 12 different models (model space) by systematically
combining different perceptual and response models (see
Figure 2), as described in detail in the Methods section. We then 
fitted these models to the trial-by-trial responses of each subject 
using Bayesian model inversion and formally compared the 
plausibility of all 12 models by random effects Bayesian model 
selection (BMS). Altogether, this corresponds to a "meta-Bayes- 
ian" approach [15], i.e., a Bayesian treatment of Bayesian models 
of cognition, also known as a "doubly Bayesian" [17] or 
"ecumenical Bayes" [18] approach. This enabled us to identify 
a hierarchical generative model, which may underlie social 
inference in our paradigm, and whose parameter estimates 
predicted independent behavioral data, such as explicit ratings 
of the players, self-reports on strategy used by the advisers and 
questionnaire scores. 

Materials and Methods 

Ethics statement 

All participants gave written informed consent before the study,
which had received ethics approval by the local responsible 
authorities (Kantonale Ethikkommission, KEK 2010-0312/3). 

Participants 

Thirty-two healthy male adult volunteers (age range: 19-30 
years; median age = 22) participated in the study. Only men 
participated in this study to avoid potential gender-related 
confounds in the pairings of advisers and players, such as gender 
differences in the perception of trustworthiness (with women being 
perceived as generally more trustworthy than men [19]). 

Participants with previous neurological or psychiatric history or 
who were taking medication at the time were excluded from the 
study. Three days before the testing session, participants received a 
battery of psychological questionnaires, which they had to fill out
online. This included the Temperament and Character Inventory 
(TCI-K) [20] to measure personality traits and the Interpersonal 
Reactivity Index (IRI) [21] to measure empathy, perspective- 
taking, and theory of mind traits. 

Experimental procedure 

Inspired by the paradigm of Behrens et al. (2008), we developed 
a deception-free and interactive economic game for monetary 
rewards. This paradigm involved pairs of volunteers (randomly 
assigned to a "player" and "adviser" role) who met each other for 
the first time on the day of the experiment. The player had to 
perform a standard probabilistic reinforcement learning task and 
was provided with truthful information about the a priori 
probabilities of trial-wise outcomes by a visual pie chart. The 
outcome was either green or blue, and all trials contained one of 6
cue types (blue:green pie charts: 75:25, 65:35, 55:45, 45:55, 35:65,
and 25:75) (Figure 1a). The adviser, however, received more
accurate information: once the outcome was determined (accord- 
ing to the probabilities of the visual pie chart), he was informed 
about the result with a constant accuracy of 80%. Based on this 
Figure 1. Experimental paradigm. Sixteen pairs of healthy male volunteers randomly assigned to the "adviser" role (A) or the "player" role (B)
interacted in an economic game. The player had to predict the outcome of a binary lottery for which the odds were shown as a pie chart (cue). The 
player saw a progress bar, which increased with every correct prediction (and decreased with every incorrect/missed prediction). If the player reached 
the silver range, he received an extra bonus of CHF 10 (Swiss Francs); if he reached gold, he received an extra CHF 20. The adviser, however, received 
more information about the outcome (constant probability of 80%), and based on this information, advised the player on which option to choose. 
Critically, the adviser's motivation to provide valid or misleading information varied across the game. In addition to the player's progress bar, the 
adviser was shown his own gold and silver ranges (which the player did not see). If the player's score landed within the adviser's silver range at the 
end of the game, the adviser received an extra CHF 10; if the player's score landed in the adviser's golden range, the adviser earned an extra CHF 20. 
Importantly, before the experiment the player was informed (truthfully) that the adviser had his own undisclosed incentives and that his intentions 
might change during the game. 
doi:10.1371/journal.pcbi.1003810.g001 



information, the adviser issued a recommendation to the player on 
which option to choose. To signal his suggestion, the adviser held 
up a blue or a green card (Figure 1b); these recommendations
were recorded, using a video camera, for use as stimuli in future 
experiments. Throughout the experiment, both the player and the 
adviser sat across from each other and were not allowed to interact 



in any other way than the adviser holding up a card to indicate his 
suggestion. 

Notably, as detailed below, the adviser's pay-off was structured 
such that his motivation to provide valid or misleading information 
varied across the game. The player therefore needed to learn 
about the time-varying intentions of the adviser in order to decide 
Figure 2. Hierarchical structure of the model space: Perceptual models, response models, specific models. The models considered in
this study have a 3 × 2 × 2 factorial structure and can be displayed as a tree. The leaves at the bottom represent individual models of social learning in
which both social and non-social sources of information are considered. The nodes at the first level represent the perceptual model families (three-level
HGF, reduced two-level HGF, and RW). Two response models were formalized under the HGF model: decision noise in the mapping of beliefs to
decisions either (1) depended dynamically on the estimated volatility of the adviser's intentions ("Volatility" model) or (2) was a fixed entity over trials
("Decision noise" model). At the third level, the response model parameters can be divided further according to the weight of social versus non-social
information - these models propose that participants' beliefs are based on (1) both cue and advice information or (2) advice only. The branch on
the left-hand side proposes a model in which only the given cue probabilities (i.e., the pie chart) enter the response model (Cue Probability).
doi:10.1371/journal.pcbi.1003810.g002
whether to trust him or not on any given trial. In addition to
computational modeling of trial-wise choices, we obtained an 
explicit readout of the player's estimates by requiring him, on 8 
out of the total of 200 trials, to characterize the advisers' intentions 
as "helpful", "misleading", or "uninformative". The timing of 
these questions and the order of the options was randomized, but 
they were presented at the same times across subjects. 

The player's final payment was proportional to his total score, 
plus a potential bonus if his score ended in a predefined silver or 
gold range (see Figure 1). He could track the accuracy of his 
predictions by monitoring a progress bar at the bottom of the 
screen, which increased with every correct prediction and 
decreased with every missed or incorrect response by 1 point. 
By reaching the silver or the gold target, he could win CHF 10 or
CHF 20, respectively (Figure 1 and Table 1). The player was 
informed before the experiment that the adviser had incentives 
that were not necessarily aligned with his own and could vary 
throughout the experiment. 

The adviser was able to monitor the player's progress, and was 
simultaneously shown his own opportunities to gain monetary 
rewards (i.e., gold and silver ranges, which were unknown to the 
player). Critically, the targets of the player and the adviser were 
arranged to create situations of shared and conflicting interests: the 
gold range of the adviser preceded the silver target of the player, 
and the silver range of the adviser also ended before the onset of 
the gold target of the player (see Figure 1 and Table 1). 

A typical interaction between the two participants during this 
game unfolded in the following manner (compare Figure 1): the 
adviser initially had an incentive to assist the player until the latter 
reached the adviser's gold range. Once the player's score was
within the adviser's gold range, the adviser's incentive to provide
misleading advice increased. Once the player recognized this
hidden change in intention and either ignored the advice or
decided to bet on the opposite color, the player's progress bar was
likely to exceed the adviser's gold range. Consequently, if the
adviser was unable to confine the player to his (the adviser's) gold 
range, the next-best strategy for the adviser was to help the player 
with correct advice again and aim to push him into his (the 
adviser's) silver range. Once the player reached the adviser's silver 
range, the adviser had an additional incentive to mislead the 
player again to prevent the player from moving out of his (the 
adviser's) silver range. 

To distinguish general inference processes under volatility from 
inference specific to intentionality, each pair of participants also 
performed a control task. To exclude temporal order effects, the 
sequence of the two tasks was counterbalanced across participants. 
In the control task, the adviser was blindfolded and issued his 
recommendation by picking a card from 6 separate decks placed 
before him by the experimenter. The blindfolding removed any 
intentionality by preventing the adviser from influencing what
advice he gave the player; furthermore, the adviser was
unable to witness trial outcomes. The predictive accuracy of the six 
decks of cards was either 80% or 20%. The players were informed 

Table 1. Player and adviser incentives. 



in advance that the card decks varied in their predictive accuracy, 
but not what the probabilities were nor that they were constant per 
each deck. However, the players could observe from which deck 
the card was sampled. This control condition thus closely 
corresponded to the main task, except for the role of intentionality: 
the player was required to track advice accuracy under volatility 
(induced by the adviser blindly switching between decks with 
different accuracy) and had to make trial-wise decisions how to 
combine the veridical information from the visual pie chart with 
the more informative (but volatile) advice. 

Both tasks included 192 trials (plus the 8 rating trials) with an 
equal number of 6 cue target types (75:25, 65:35, 55:45, 45:55, 
35:65, and 25:75 blue: green pie charts). The trial outcome was 
randomly drawn from these probability distributions. At the end of 
the study, all participants were debriefed and asked to describe the 
strategy that they employed during the game. 
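The trial structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_trials`, the random seed, and the shuffling of cue order are hypothetical choices, since the text specifies the cue types, the trial count, and the 80% accuracy of the adviser's information, but not how trials were ordered.

```python
import random

# Pie-chart probabilities of "blue" for the 6 cue types named in the text.
CUE_BLUE_PROBS = [0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
TRIALS_PER_CUE = 32            # 6 cue types x 32 = 192 trials
ADVISER_INFO_ACCURACY = 0.8    # accuracy of the information shown to the adviser


def make_trials(seed=0):
    """Return 192 trials with cue, drawn outcome and adviser information.

    The shuffled order is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    cues = [p for p in CUE_BLUE_PROBS for _ in range(TRIALS_PER_CUE)]
    rng.shuffle(cues)
    trials = []
    for p_blue in cues:
        # The outcome is drawn from the probabilities shown in the pie chart.
        outcome = "blue" if rng.random() < p_blue else "green"
        other = "green" if outcome == "blue" else "blue"
        # The adviser is told the drawn outcome correctly with probability 0.8.
        info = outcome if rng.random() < ADVISER_INFO_ACCURACY else other
        trials.append({"p_blue": p_blue, "outcome": outcome, "adviser_info": info})
    return trials


trials = make_trials()
```

Note that the adviser's information is generated after the outcome is drawn, so its accuracy does not depend on the cue type.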

Computational modeling 

In the present study, we examined how subjects updated their 
beliefs about others' intentions and chose to follow or disregard 
their advice. For this purpose, we applied two cognitive models 
(which we here refer to as "perceptual models"): (i) the 
Hierarchical Gaussian Filter (HGF), a generic Bayesian model of 
learning under perceptual uncertainty and environmental volatil- 
ity [22], and (ii) the Rescorla-Wagner (RW) model [23], a 
commonly used reinforcement learning model. In order to verify 
whether players really deploy hierarchical learning and infer on 
the volatility of the adviser's intentions, we also included a reduced 
(non-hierarchical) version of the HGF as control; this alternative 
model contained only two levels of learning (see Table 2). 

Furthermore, in order to link trial-by-trial beliefs to the 
observed decisions (and thus enable model inversion), we 
considered several alternative response models, which differed 
with regard to whether participants incorporated social and/or 
non-social sources of information. Together, this resulted in a 
factorial model space (see Figure 2), which is described in more 
detail below. 

Perceptual models 

Hierarchical Gaussian Filter (HGF). The HGF is a generic
hierarchical model of learning, which allows for inference on an 
agent's beliefs about the state of the world from his/her observed 
behavior (see [22] for theoretical background and [24] and [25] 
for recent applications). This model is related to the "Bayesian 
brain" hypothesis [26-30], which postulates that evolutionary 
selection should have resulted in neural and cognitive processing 
principles that approximate a statistical optimum. This implies 
that the brain maintains and continually updates a generative 
(predictive) model of its sensory inputs, which allows for inference 
on hidden environmental states that are hierarchically organized 
and cause the sensory inputs that the agent experiences. In the 
HGF, these states evolve in time as hierarchically coupled 



Player Silver Target Gold Target 
Adviser Gold Range Silver Range
Table 2. Prior mean and variance of the perceptual and response model parameters.

Parameter    Prior mean    Prior variance

(i) HGF model class M, . . .
Note: The prior variances are given in the space in which parameters are estimated. $\kappa$, $\vartheta$, [...] and $\zeta$ are estimated in logit-space, while $\sigma_2$, $\sigma_3$ and [...] are
estimated in log-space.
doi:10.1371/journal.pcbi.1003810.t002



Gaussian random walks where, at any given level, the variance 
(step size) is controlled by the level above (see [22] for details). 

In brief, the HGF proposes that an agent uses a sequence of
sensory inputs to make inferences on a hierarchy of hidden states
$x_1^{(k)}, x_2^{(k)}, \ldots, x_n^{(k)}$ of its environment (where $k$ is a trial index and $n$
is the number of levels in the hierarchy); see Figure 3. In the
context of our paradigm, the player has to infer on the congruence
between the advice and the outcome. Thus, the hidden state $x_1$
denotes the accuracy of the advice, which is binary, i.e., any single
piece of advice is either accurate ($x_1^{(k)} = 1$) or inaccurate ($x_1^{(k)} = 0$).

Beliefs about advice accuracy and advisers' intentionality are
represented as time-varying states in the model, where all states
higher than $x_1$ are continuous and evolve as Gaussian random
walks, which are hierarchically coupled to each other in the
following manner: The lowest state, $x_1$, represents the participant's
belief about advice accuracy, i.e., the probability that the advice is
accurate (Eq. (1)), and depends on the next higher (unbounded)
state $x_2$ via the logistic sigmoid transformation $s(\cdot)$ in Eq. (2).



$$p(x_1 \mid x_2) = s(x_2)^{x_1} \, (1 - s(x_2))^{1 - x_1} = \mathrm{Bernoulli}(x_1; s(x_2)) \qquad (1)$$

where

$$s(x) = \frac{1}{1 + \exp(-x)} \qquad (2)$$



At the next higher level, $x_2$ denotes the belief about the adviser's
tendency to deliver accurate advice (i.e., the adviser's current
degree of helpfulness). The variance or the step size with which $x_2$
evolves over time depends on the level above, the highest state $x_3$.



$$p(x_2^{(k)} \mid x_2^{(k-1)}, x_3^{(k)}, \kappa, \omega) = \mathcal{N}\left(x_2^{(k)};\, x_2^{(k-1)},\, \exp(\kappa x_3^{(k)} + \omega)\right) \qquad (3)$$
The highest state $x_3$ represents the (log) volatility of the adviser's
intentions (i.e., tendency to offer accurate advice).

$$p(x_3^{(k)} \mid x_3^{(k-1)}, \vartheta) = \mathcal{N}\left(x_3^{(k)};\, x_3^{(k-1)},\, \vartheta\right) \qquad (4)$$



and contain only a few parameters (see [22] for details). As 
mentioned above, the form of these update equations is similar to 
those of RW learning, providing a Bayesian perspective on 
reinforcement learning theory [32]. Under our scheme, the 
general structure of these belief updates can be summarized as: 



The temporal evolution of these states and their influence on
each other is captured by three parameters, which can differ across
agents to allow for individual belief-updating styles: (i) $\kappa$
determines the degree to which the second level $x_2$ is coupled to
the third level $x_3$, (ii) $\omega$ represents the constant (tonic) component
of the log-volatility at the second level, which is independent of the
variable (phasic) component $x_3$, and (iii) $\vartheta$ determines how quickly
$x_3$ evolves over time (i.e., the step size of the Gaussian random
walk performed by $x_3$). This results in the following generative
model (for a graphical model of our implementation of the HGF,
see Figure 3):

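$$p(x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, x_2^{(k-1)}, x_3^{(k-1)} \mid \kappa, \omega, \vartheta) = p(x_1^{(k)} \mid x_2^{(k)}) \, p(x_2^{(k)} \mid x_2^{(k-1)}, x_3^{(k)}, \kappa, \omega) \, p(x_3^{(k)} \mid x_3^{(k-1)}, \vartheta) \, p(x_2^{(k-1)}, x_3^{(k-1)}) \qquad (5)$$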
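To make the generative model concrete, the following sketch samples the three hidden states forward in time: a Gaussian random walk for the log-volatility $x_3$ with variance $\vartheta$, a random walk for $x_2$ whose variance is $\exp(\kappa x_3 + \omega)$, and a Bernoulli draw for $x_1$ with probability $s(x_2)$. The function name `simulate_hgf`, the initial states and the parameter values in the usage line are hypothetical and chosen only for illustration; they are not the values used in the study.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid of Eq. (2), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_hgf(n_trials, kappa, omega, theta, x2_init=0.0, x3_init=0.0, seed=0):
    """Sample (x1, x2, x3) per trial from the three-level generative model.

    Initial states and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    x2, x3 = x2_init, x3_init
    states = []
    for _ in range(n_trials):
        # Level 3: random walk of the log-volatility with variance theta.
        x3 = rng.gauss(x3, math.sqrt(theta))
        # Level 2: random walk whose variance is exp(kappa * x3 + omega).
        x2 = rng.gauss(x2, math.sqrt(math.exp(kappa * x3 + omega)))
        # Level 1: advice is accurate (1) with probability s(x2).
        x1 = 1 if rng.random() < sigmoid(x2) else 0
        states.append((x1, x2, x3))
    return states


# Hypothetical parameter values, for illustration only.
states = simulate_hgf(192, kappa=1.0, omega=-4.0, theta=0.05)
```

Larger values of $\omega$ or $x_3$ widen the steps of $x_2$, so the simulated adviser switches between helpful and misleading phases more quickly.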



We assume that observers update their beliefs on these
hierarchically-coupled states in a trial-by-trial fashion by applying
an efficient approximation to ideal Bayesian inference. Under a
generic mean-field approximation, such update rules have a simple
and interpretable form: at each level of the hierarchy $i$, updates of
beliefs (posterior means) $\mu_i$ are proportional to the prediction error
($\delta$) from the level below, weighted by a precision ratio:



$$\text{prediction}^{(k)} = \text{prediction}^{(k-1)} + \text{learning\_rate} \times \text{prediction\_error} \qquad (8)$$



This structure is reflected by Eq. 6. 

A key difference to the RW model is that the HGF uses a dynamic
learning rate that is represented as a ratio of precision estimates,
where at any hierarchical level $i$, the numerator represents the
(likelihood) precision of the prediction at the level below, $\hat{\pi}_{i-1}$, while
the denominator contains the precision of the current belief, $\pi_i$.
What follows from this expression is that prediction errors are given
a larger weight (and thus belief updates are more pronounced) when
the precision of the data (input from the lower level) is high or when
the precision of the prior belief is low. This can be seen as an
analogue to Kalman filtering, in the sense that the precision
weighting of prediction errors corresponds to the Kalman gain.
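The effect of precision weighting can be illustrated with the simplest conjugate case, a single Gaussian belief updated by a single Gaussian observation. This is not the HGF update itself, only a one-level analogue under that simplifying assumption; the function name `precision_weighted_update` is hypothetical.

```python
def precision_weighted_update(prior_mean, prior_precision, observation, data_precision):
    """One Gaussian belief update with a precision-ratio learning rate.

    The learning rate is the data precision divided by the precision of the
    updated belief, which plays the role of the Kalman gain in this toy case.
    """
    posterior_precision = prior_precision + data_precision
    learning_rate = data_precision / posterior_precision
    prediction_error = observation - prior_mean
    posterior_mean = prior_mean + learning_rate * prediction_error
    return posterior_mean, posterior_precision, learning_rate


# Precise data (4.0) against a vague prior (1.0): large update.
mean_hi, _, lr_hi = precision_weighted_update(0.0, 1.0, 1.0, 4.0)
# Imprecise data (0.25) against the same prior: small update.
mean_lo, _, lr_lo = precision_weighted_update(0.0, 1.0, 1.0, 0.25)
```

With the same prediction error, the belief moves further when the data precision is high relative to the prior precision, which is the qualitative behavior described in the paragraph above.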

Learning rates. Unlike in the RW model, the learning rate 
modeled in the HGF is dynamic and fluctuates trial-by-trial as a 
result of changes in both informational uncertainty and the 
volatility of the adviser's intentions. Due to the hierarchical model 
structure, we must consider two learning rates: the first learning 
rate varies as a function of perceptual and informational 
uncertainty and is proportional to the precision (see Eq. (9)).



1 



(9) 



n) ' 



where 



$$\Delta\mu_i^{(k)} \propto \frac{\hat{\pi}_{i-1}^{(k)}}{\pi_i^{(k)}} \, \delta_{i-1}^{(k)} \qquad (6)$$

where $\hat{\pi}_{i-1}^{(k)}$ and $\pi_i^{(k)}$ are precisions of the prediction about input
from the level below and of the belief at the current level,
respectively. (A note on notation: the superscript ^ denotes the
"prediction". Hence, $\hat{\mu}_i^{(k)}$ refers to the prediction on trial $k$ before
experiencing the trial outcome, and $\hat{\pi}_i^{(k)}$ is the precision of this
prediction.)

Rescorla-Wagner (RW) model. Reinforcement learning
models propose that agents learn to take actions that maximize
the probability of future rewards; therefore, agents learn the
"value" of different stimuli and actions [31]. One of the most
widely used reinforcement learning models is the RW model [23],
where predictions about value are updated in proportion to a
prediction error weighted by a learning rate. The RW model does
not employ a hierarchy of hidden states, but a single state $v$ (which,
in our case, describes the subject's estimated value of the advice)
and one free parameter, the individual learning rate $\alpha$, which is
constant across trials:



$$v^{(k)} = v^{(k-1)} + \alpha \, \delta^{(k)} \qquad (7)$$
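A minimal sketch of the RW update with a constant learning rate is given below. It assumes that the prediction error on each trial is the difference between the observed advice accuracy (1 or 0) and the current value, and that the initial value is 0.5; both choices, as well as the function name `rescorla_wagner`, are illustrative assumptions rather than details taken from the study.

```python
def rescorla_wagner(advice_correct, alpha, v_init=0.5):
    """Track the value of the advice with a fixed learning rate alpha.

    advice_correct: sequence of 1 (advice was accurate) or 0 (inaccurate).
    Returns the prediction made before each trial and the final value.
    """
    v = v_init
    predictions = []
    for x in advice_correct:
        predictions.append(v)   # prediction held before the outcome is seen
        delta = x - v           # prediction error (assumed form)
        v = v + alpha * delta   # update weighted by the constant learning rate
    return predictions, v


predictions, final_value = rescorla_wagner([1, 1, 0], alpha=0.5)
# predictions == [0.5, 0.75, 0.875], final_value == 0.4375
```

Because `alpha` never changes, a single inaccurate piece of advice moves the value by the same fraction of the prediction error early and late in the game.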



Structural interpretation of the HGF update equations: 
Analogy to the RW model. The hierarchical Bayesian model 

that we used to fit the data might seem complicated at first glance, 
but it can be reduced (via a variational approximation) to 
analytical update equations that have an easily interpretable form 



,(Af))=.(,f)(l-.(,f)) 



(10) 



Although $x_1$ does not depend on its previous state in time but
results from a sigmoid transform of $x_2$ (see Eq. (2) and Figure 3),
Eq. (9) transforms the learning rate at the second level into a
learning rate at the first level (for more details, see Eq. D.1 in the
Supplementary Material to [25]).
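One way to see why the derivative of the sigmoid enters here is a first-order approximation: a small step in the second-level belief changes the first-level prediction $s(\mu_2)$ by roughly $s'(\mu_2)$ times that step, with $s'(x) = s(x)(1 - s(x))$. The numerical check below is an illustration of this general property of the logistic sigmoid, not the authors' derivation; the values of `mu2` and `step` are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic sigmoid of Eq. (2)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


mu2 = 0.4     # arbitrary second-level belief
step = 0.01   # arbitrary small update at the second level

# Exact change of the first-level prediction versus its linear approximation.
exact_change = sigmoid(mu2 + step) - sigmoid(mu2)
approx_change = sigmoid_derivative(mu2) * step
```

The two quantities agree up to a term of order `step ** 2`, so scaling a second-level update by $s'(\mu_2)$ expresses it on the probability scale of the first level.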

Furthermore, there is also a learning rate at the third level,
which has a more complicated form that depends on the estimated
mean $\mu_3$ of the volatility $x_3$ of the adviser's intentions and on the
precision at the third level (for more details, see [22]):



(t-l)j. 



K I e 



(11) 



where c 



,(*-!). 



Response models 

The response model describes how the agent's beliefs (the result 
of perceptual inference) map onto choices (actions). In our task, 
subjects can integrate social and non-social information, or use 
either source of information exclusively. Specifically, the pie chart 
indicates the true a priori probability c about the outcome as non- 
social information that is directly accessible to the player without




Figure 3. Graphical model of the HGF and the response model. In the graphical model, the diamonds represent quantities that change in
time (i.e., that carry a time or trial index $k$) but that do not depend on their previous state. The hexagons, however, represent states that change in
time but additionally depend on their previous state in time in a Markovian fashion. Circles, on the other hand, denote fixed parameters. $x_1$
represents the accuracy of the current piece of advice, $x_2$ the adviser's current tendency to give accurate advice and $x_3$ the current volatility of the
adviser's intentions. Parameter $\kappa$ determines how strongly $x_2$ and $x_3$ are coupled, $\omega$ represents the tonic component of the log-volatility in $x_2$ and $\vartheta$
denotes the meta-volatility in $x_3$. The response model has 2 layers: (1) the computation of the integrated belief or p(outcome|cued probability,
advice), i.e., the probability of the outcome given both the non-social cue and the advice; (2) the chosen action, drawn from the integrated belief
using a sigmoid decision rule. Parameter $\zeta$ determines the weight of the advice compared to the non-social cue. $y$ represents the subject's binary
response ($y = 1$: deciding to accept the advice, $y = 0$: going against the advice).
doi:10.1371/journal.pcbi.1003810.g003



need for inference. By contrast, the (uncertain) social information
corresponds to the player's belief that the adviser gives correct
advice on the current trial. In the HGF, this belief $\hat{\mu}_1^{(k)}$
corresponds to the logistic sigmoid transform of the predicted
tendency $\mu_2^{(k-1)}$ (the posterior from the previous trial) of the
adviser to give correct advice (see Eq. (12)):

$$\hat{\mu}_1^{(k)} = s(\mu_2^{(k-1)}) \qquad (12)$$



The response model describes how the player bases his decision 
on a weighted average of the two sources of information. Taking 
$\zeta \in [0,1]$ as the weight of the social information, we obtain the
integrated belief that the advice is accurate:

$$b^{(k)} = \zeta \, \hat{\mu}_1^{(k)} + (1 - \zeta) \, c^{(k)} \qquad (13)$$

For the RW model, $\hat{\mu}_1^{(k)}$ is replaced by $v^{(k-1)}$.

The probability of the player following the advice (i.e., making
decision $y = 1$ as opposed to $y = 0$ for going against the advice) is
then described by a sigmoid function, which maps the unit interval
[0,1] onto itself for a given decision noise parameter $\beta > 0$ (note
that this function differs from the logistic sigmoid above, which
maps the whole real line onto the unit interval):

$$p\left(y^{(k)} = 1 \mid b^{(k)}\right) = \frac{\left(b^{(k)}\right)^{\beta}}{\left(b^{(k)}\right)^{\beta} + \left(1-b^{(k)}\right)^{\beta}} \qquad (14)$$
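The belief-to-response mapping can be sketched in the same way. The code below assumes that Eq. (14) takes the "unit-square sigmoid" form b^β / (b^β + (1−b)^β), which is consistent with the description above (it maps [0,1] onto itself and leaves b = 0.5 unchanged). The function name is hypothetical.

```python
def unit_square_sigmoid(b: float, beta: float) -> float:
    """Probability of following the advice given integrated belief b.

    beta > 0 controls the steepness: beta = 1 returns b itself, larger
    values push the response probability towards 0 or 1.
    """
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    if beta <= 0.0:
        raise ValueError("beta must be positive")
    follow = b ** beta
    oppose = (1.0 - b) ** beta
    return follow / (follow + oppose)


# Same belief, increasingly deterministic responding (illustrative values)
p_noisy = unit_square_sigmoid(0.7, 1.0)
p_medium = unit_square_sigmoid(0.7, 2.0)
p_steep = unit_square_sigmoid(0.7, 10.0)
```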

In systematic model comparisons, we compared variations in 
Eqs. (13) and (14), examining whether (i) subjects were more likely 
to integrate social and non-social information or used either source 
of information exclusively, and whether (ii) the decision noise in 
the mapping from beliefs to decisions (i.e., $\beta$ in Eq. (14)) was fixed
or varied in time as a function of the estimated adviser's volatility. 
These variations are detailed in the section on "Model space" 
below. 

Model inversion 

Priors of the model parameters, namely $\zeta$ for all models as well
as $\{\kappa, \omega, \vartheta\}$ for the HGF and $\alpha$ for the RW model, are listed in
Table 2. We defined the priors based on the experimental design
and pilot data. For parameters that were strictly bounded between
0 and 1, we chose the prior mean to be 0.5. For real-valued
parameters, we chose prior means that represented values under 
which an ideal Bayesian agent would experience the least surprise 
about its sensory inputs (see the functions tapas_fitModel.m and 
tapas_bayes_optimal_binary_config.m in the HGF toolbox). The 
priors were chosen to be relatively uninformative (with large 
variances) to allow for substantial individual differences in learning 
and advice weighting. 

In the HGF models, we also estimated participants' initial 
beliefs about the advice accuracy and the adviser's volatility, as 
well as their uncertainty about these two quantities. Parameters 
and states are estimated in spaces where they are unbounded. For 
example, parameters confined to the [0,1] interval are
logit-transformed and thus also estimated in an unbounded space.
Given the priors over parameters and the input sequence, 
maximum-a-posteriori (MAP) estimates of model parameters 
were calculated using the HGF toolbox version 2.1. The code
used is freely available as part of the open source software
package TAPAS at http://www.translationalneuromodeling.org/tapas.
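To illustrate estimation in an unbounded space, the following sketch finds a MAP estimate of the advice weight on the logit scale. It is not the toolbox implementation: a coarse grid search stands in for the quasi-Newton optimizer, only one parameter is estimated, and both the Gaussian prior (mean 0 and variance 1 in logit space, i.e., a prior mean of 0.5 for the weight) and the form of the response model are assumptions made for the example.

```python
import math


def logit(p: float) -> float:
    """Map (0, 1) onto the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map the real line back onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def log_joint(zeta_logit, choices, mu1_hat, cue, beta, prior_mean=0.0, prior_var=1.0):
    """Log prior (Gaussian in logit space, up to a constant) plus log likelihood."""
    zeta = inv_logit(zeta_logit)
    total = -0.5 * (zeta_logit - prior_mean) ** 2 / prior_var
    for y, m, c in zip(choices, mu1_hat, cue):
        b = zeta * m + (1.0 - zeta) * c                   # integrated belief
        p = b ** beta / (b ** beta + (1.0 - b) ** beta)   # assumed response function
        p = min(max(p, 1e-12), 1.0 - 1e-12)               # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def map_zeta(choices, mu1_hat, cue, beta=2.0, grid_size=2001, half_width=6.0):
    """Grid-search MAP estimate of zeta (a stand-in for gradient-based optimization)."""
    grid = [-half_width + 2.0 * half_width * i / (grid_size - 1) for i in range(grid_size)]
    best = max(grid, key=lambda x: log_joint(x, choices, mu1_hat, cue, beta))
    return inv_logit(best)


# Hypothetical player who always follows highly reliable advice under an uninformative cue
zeta_hat = map_zeta(choices=[1] * 20, mu1_hat=[0.9] * 20, cue=[0.5] * 20)
```

Because the search runs on the logit scale, the returned estimate always lies strictly between 0 and 1, and without any data it falls back on the prior mean of 0.5.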

Optimization was performed using a quasi-Newton optimization
algorithm [33-36]. The objective function for maximization
was the log-joint posterior density over all perceptual and
observation parameters, given the data and the generative model.
To exclude the possibility that this gradient-based optimization
could have been influenced by local maxima of the log-joint
objective function, we used two additional global optimization
methods, a Gaussian process optimization algorithm (GPO) [37]
and a Markov chain Monte Carlo (MCMC) [38] sampling scheme.



Model space 

Overall, our model space was structured hierarchically, as
shown in Figure 2. We combined three alternative perceptual
models with four potential response models, constituting a total of
12 models, $M_1 \ldots M_{12}$, which are described in more detail below.

Although the assumptions of hierarchically coupled learning
were well founded, we also considered that participants' decisions
could be explained by simpler non-hierarchical models. To
examine this hypothesis, we included two model classes, which
were both non-hierarchical. The first was a simplified version of
the HGF ($M_7 \ldots M_9$), in which the volatility at the third level was
fixed to its prior mean and did not evolve over time (see Table 2
for the prior values used). This model assumed that participants
ignored the instructions that the advisers' intentions might change
in time, expecting negligible changes in log-volatility at the third
level. The second model class was the classical RW model
($M_{10} \ldots M_{12}$), which assumed a fixed learning rate.
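For comparison with the hierarchical models, the fixed-learning-rate update of the RW model can be sketched as below. The sketch assumes the standard delta rule, in which the value estimate moves towards each binary outcome by a constant fraction of the prediction error; variable names and the initial value of 0.5 are illustrative.

```python
def rescorla_wagner(outcomes, alpha, v0=0.5):
    """Run a Rescorla-Wagner learner over binary outcomes.

    outcomes: sequence of 0/1 values (1 = advice was correct)
    alpha:    constant learning rate in [0, 1]
    v0:       initial estimate of advice accuracy (assumed)

    Returns the trial-wise predictions (made before each outcome is seen)
    and the final value estimate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    v = v0
    predictions = []
    for u in outcomes:
        predictions.append(v)
        v = v + alpha * (u - v)  # prediction error scaled by a fixed learning rate
    return predictions, v


predictions, final_value = rescorla_wagner([1, 1, 0], alpha=0.5)
```

Unlike in the HGF, the step size here does not depend on how stable the adviser is believed to be.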

Concerning the response models, the key question was whether
participants integrated social and non-social sources of information
or relied exclusively on one of the two sources of information.
Here, we included two (reduced) response models, which
proposed that participants considered either the advice alone or
the cue alone when predicting the outcome. The first model was
defined by setting $\zeta$ to 1 (see Eq. (15) and Table 2), whereas the
second only included the displayed winning probabilities, with $\zeta$
fixed to 0 (see Eq. (16) and Table 2).

$$b^{(k)} = \hat{\mu}_1^{(k)} \qquad (15)$$

$$b^{(k)} = c^{(k)} \qquad (16)$$

Notice that the latter response model is not coupled to any of
the perceptual models, because it suggests that participants do not
learn about the validity of the advice and intentions of the adviser:
on the contrary, they base their decisions only on the displayed
winning probabilities.

Furthermore, we assessed two potential mechanisms of belief-to-
response mapping (see Eq. (14)), by including models which either
assumed that (i) participants responded in accordance with their
belief about advice accuracy but tainted by decision noise
("Decision noise" model family for models $M_4 \ldots M_{12}$), or that (ii)
participants' decisions were based on their estimates of the
volatility of the adviser's intentions ("Volatility" model family for
models $M_1, M_2, M_3$).

The "Decision noise" model refers to parameter $\beta$ in Eq. (14),
which represents the inverse of the decision temperature: as
$\beta \to \infty$, the sigmoid function becomes steeper, approaching a step
function (no decision noise) at $b = 0.5$. By contrast, the "Volatility"
model family contains a time-varying mapping of beliefs onto
decisions. In this model set, the inverse decision temperature $\beta$
varies with the estimate of adviser volatility, as $e^{-\mu_3}$. Hence, as the
estimated volatility of the adviser's intentions decreases, the
sigmoid function becomes steeper. This predicts that on trials
when the player infers that the adviser's intentions are stable, he
responds in accordance with his beliefs. As the volatility increases,
the player becomes more uncertain of the adviser's intentions, and
thus behaves in a more exploratory manner, resulting in a noisier
mapping of belief-to-response probabilities. It is important to note
that the mapping of beliefs onto actions is updated trial-wise,
unlike in the case of the "Decision noise" response models, in
which the link from beliefs to decisions is determined by a fixed,
subject-specific parameter $\beta$. Please see Video S1 for a
demonstration of how the states of the perceptual model map onto
decisions using equations (13) and (14), given all possible ranges of
response model parameter $\zeta$.
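The "Volatility" mapping can be sketched as follows. The code assumes that the inverse decision temperature equals the exponential of the negative volatility estimate (one reading of the expression above) and that the response function is the unit-square sigmoid; the volatility values used are hypothetical.

```python
import math


def response_probability(b: float, beta: float) -> float:
    """Unit-square sigmoid (assumed form of the response function)."""
    return b ** beta / (b ** beta + (1.0 - b) ** beta)


def volatility_beta(mu3: float) -> float:
    """Trial-wise inverse decision temperature derived from the volatility estimate."""
    return math.exp(-mu3)


belief = 0.7  # integrated belief that the advice is accurate

# Adviser perceived as stable (low log-volatility): steep, belief-consistent mapping
p_stable = response_probability(belief, volatility_beta(-1.0))

# Adviser perceived as volatile (high log-volatility): flat, more exploratory mapping
p_volatile = response_probability(belief, volatility_beta(1.0))
```

In this sketch the same belief of 0.7 leads to a response probability above 0.7 when the adviser is perceived as stable and to one closer to 0.5 when the adviser is perceived as volatile.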

Thanks to our factorial model space, we used family-level
inference [39] to determine (i) the most likely class of perceptual
models, pooling across all response models, and (ii) the most likely
response model class, pooling across all perceptual models.

Bayesian model selection and family inference 

Before inferring on the model parameters, we evaluated the
model space using Bayesian model selection (BMS). This
procedure rests on computing an approximation to the model
evidence, or $p(y \mid m)$, the probability of the data $y$ given a model $m$
[40]. The model evidence is the integral of the joint density of data
and parameters over the entire parameter space, which cannot be
evaluated analytically. However, one can approximate the log model
evidence with a lower bound, the so-called (negative) free energy $F$.

Alternative models can then be compared via the ratio of their
respective evidences, i.e., their Bayes factor or, equivalently, the
difference in their log-evidences. At the group level, a group Bayes
factor (GBF) can be computed by multiplying Bayes factors across
subjects. The disadvantage of this procedure, however, is that it
rests on the fixed-effects assumption that all participants' data are
generated by the same model and variation is simply due to
measurement noise [41]. This is not appropriate for our paradigm,
as it emphasizes individual differences in social learning (e.g., by
letting advisers choose their own strategy). This requires a
random-effects BMS approach, where the model becomes a
random variable in the population.
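Under the fixed-effects assumption, multiplying Bayes factors across subjects is equivalent to summing the subject-wise differences in log-evidence. A minimal sketch with made-up log-evidence values (the function name is hypothetical):

```python
import math


def log_group_bayes_factor(log_evidence_a, log_evidence_b):
    """Log GBF of model A over model B: the sum of subject-wise log Bayes factors."""
    if len(log_evidence_a) != len(log_evidence_b):
        raise ValueError("both models need one log-evidence per subject")
    return sum(a - b for a, b in zip(log_evidence_a, log_evidence_b))


# Hypothetical (negative free energy) values for three subjects
model_a = [-100.0, -98.0, -105.0]
model_b = [-103.0, -99.0, -104.0]

log_gbf = log_group_bayes_factor(model_a, model_b)
gbf = math.exp(log_gbf)
```

Because the sum is unbounded, a single subject with a very large log Bayes factor can dominate the group result, which is one practical reason to prefer the random-effects approach described next.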

The random-effects BMS approach we use here rests on a
hierarchical scheme introduced by Stephan et al. (2009), which
estimates the parameters of a Dirichlet distribution of the
probabilities $r_k$ of all models considered; in turn, these
probabilities inform a multinomial distribution over model space.
This makes it straightforward to compute the posterior probability
that a given model generated the data for any randomly
selected participant, relative to all other models considered (for
details see [41]). Similarly, one can compute the "exceedance
probability" that a particular model is more likely than any other
model in the comparison set. In other words, the exceedance
probability represents the amount of evidence that, in the
population studied, a given model is more frequent than the
others.
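Given the parameters of the Dirichlet distribution, both quantities can be approximated numerically. The sketch below assumes that the Dirichlet parameters have already been estimated (the values used are invented) and approximates exceedance probabilities by Monte Carlo sampling; it does not implement the variational estimation scheme itself.

```python
import random


def expected_frequencies(alpha):
    """Expected model probabilities under a Dirichlet with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def exceedance_probabilities(alpha, n_samples=20000, seed=0):
    """Fraction of samples in which each model has the highest probability."""
    rng = random.Random(seed)
    wins = [0] * len(alpha)
    for _ in range(n_samples):
        r = sample_dirichlet(alpha, rng)
        wins[r.index(max(r))] += 1
    return [w / n_samples for w in wins]


# Hypothetical Dirichlet parameters for three models
alpha = [12.0, 2.0, 2.0]
freq = expected_frequencies(alpha)
xp = exceedance_probabilities(alpha)
```

Note that the exceedance probability of the first model in this example is much closer to 1 than its expected frequency of 0.75, since the two quantities answer different questions.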

In case no model really stands out as a "winner" (i.e., no high
exceedance probability), we can partition the model space, pool
evidence over subsets of models that share a common feature (e.g.,
with and without a hierarchical level) and thus compare model
subspaces or families, instead of single models (see [39] for details).
This idea is essentially similar to factorial experimental designs in
psychology where data from all cells are used to assess the strength
of main effects and interactions. It amounts to specifying a
partition $F$, which splits the entire model set into $k = 1 \ldots K$ subsets
(model families). The subset $f_k$ contains all models belonging to
family $k$, where there are $N_k$ models in the $k$-th subset. Due to the
agglomerative property of the Dirichlet distribution, for any
partition of model space into families, it is straightforward to define
a new Dirichlet density reflecting this split (see Eq. 18 in [41]). The
family probabilities are then given by:

$$s_k = \sum_{m \in f_k} r_m \qquad (17)$$

where $s_k$ is the probability of each family occurring in the
population. Exceedance probabilities $\varphi_k$ can then be computed for
each family, in the same way as for single models. They
correspond to the probability that family $k$ is more frequent than
any other family (of all $K$ families considered), given the data $Y$
from all subjects:

$$\varphi_k = p\left(s_k > s_j \mid Y, \forall j \neq k\right) \qquad (18)$$

Because the conditional model probabilities sum to one over
all models considered, this equation becomes particularly intuitive
when model space is split into 2 families:

$$\varphi_1 = p\left(s_1 > s_2 \mid Y\right) = p\left(s_1 > 0.5 \mid Y\right) \qquad (19)$$
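The two-family case lends itself to a short numerical illustration. The sketch below relies on the agglomerative property mentioned above: summing the Dirichlet parameters within each family yields the family-level density, which for two families reduces to a Beta distribution. The Dirichlet parameters and the partition are invented, and the sketch leaves aside how priors are set at the family level.

```python
import random


def family_alphas(alpha, partition):
    """Sum Dirichlet parameters within each family (agglomerative property)."""
    return [sum(alpha[m] for m in members) for members in partition]


def two_family_exceedance(alpha, partition, n_samples=20000, seed=0):
    """Monte Carlo estimate of p(s_1 > 0.5 | Y) for a two-family partition."""
    if len(partition) != 2:
        raise ValueError("this sketch only handles two families")
    a1, a2 = family_alphas(alpha, partition)
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if rng.betavariate(a1, a2) > 0.5)
    return hits / n_samples


# Hypothetical Dirichlet parameters for six models, split into two families
alpha = [5.0, 1.0, 1.0, 4.0, 1.0, 1.0]
partition = [[0, 3], [1, 2, 4, 5]]  # model indices belonging to each family

pooled = family_alphas(alpha, partition)
phi_1 = two_family_exceedance(alpha, partition)
```

With more than two families, the same pooling applies, but the exceedance probabilities then require sampling from the pooled Dirichlet rather than from a Beta distribution.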



Simulations 

To test the performance of BMS for our particular case, we
generated 20 datasets for each of the 4 perceptual models
considered under realistic levels of decision noise and using the
prior means as parameter values. Thus, we augmented the
softmax function in Eq. (14), which describes the mapping of
beliefs onto actions, with a noise parameter $\eta = 0.5$:

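$$p\left(y^{(k)} = 1 \mid b^{(k)}\right) = \frac{\left(b^{(k)}\right)^{\exp(\log(\beta)-\eta)}}{\left(b^{(k)}\right)^{\exp(\log(\beta)-\eta)} + \left(1-b^{(k)}\right)^{\exp(\log(\beta)-\eta)}} \qquad (20)$$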

We then performed model inversion using the quasi-Newton 
optimization algorithm (in the same way as for the other models in 
this paper) and summarized the performance of BMS in terms of a 
confusion matrix (Figure S1). This matrix depicts the frequency of
"correct cases" (where the model which generated the data has the 
highest exceedance probability of all models tested); here, rows 
denote the model that generated the data, and columns the model 
that was inverted. Thus, off-diagonal matrix elements indicate the 
frequency with which one generative model is "confused" with 
another. 
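The bookkeeping behind such a confusion matrix can be sketched as follows. The model labels and recovery outcomes are invented for illustration; in practice each pair would come from simulating data under one model and running model selection on the result.

```python
def confusion_matrix(pairs, labels):
    """Row-normalized confusion matrix.

    pairs:  iterable of (generating_model, selected_model) tuples
    labels: ordered list of model names
    Rows index the generating model, columns the model favoured by selection.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for generating, selected in pairs:
        counts[index[generating]][index[selected]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical recovery results for two models
labels = ["HGF", "RW"]
pairs = [("HGF", "HGF")] * 3 + [("HGF", "RW")] + [("RW", "RW")] * 2
matrix = confusion_matrix(pairs, labels)
```

Diagonal entries close to 1 indicate that the generating model is reliably recovered.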

Results 

Examining the robustness of model inversion and 
comparison 

As described in the Methods section, we examined the 
robustness of our BMS results in two ways. First, we used three 
different optimization schemes for inverting subject-wise models 
(quasi-Newton, MCMC, Gaussian process optimization). As 
shown by Table 3 (which lists the posterior probability of all 12 
models under each optimization scheme), BMS results were 
consistent across all schemes. Second, the simulation results 
suggest that in the large majority of cases, the perceptual models 
that generated synthetic data could be recovered by model 
selection (Figure S1).

Do subjects exploit volatility estimates of the advisers' 
intentions to dynamically update estimates of advice 
accuracy? 

When comparing all 12 models against each other, random
effects BMS showed that the three-level HGF augmented by the
"Volatility" response model ($M_1$) performed significantly better
than the rest of the models in the majority of participants
(exceedance probability $\varphi = 0.99$; Table 4 and S2-S3). Across the
perceptual model family, the three-level HGF family ($M_1 \ldots M_6$)
outperformed non-hierarchical models ($M_7 \ldots M_{12}$) including the
reduced HGF and RW models ($\varphi = 0.99$; Table 5; Figure 4a).
Taken together, these findings indicate that participants infer on
two quantities, the advice accuracy and the volatility of the
adviser's intentions, and incorporate the time-varying volatility
estimates of the advisers' intentions into their learning about the
advice.

The HGF quantifies subject-specific learning rates at distinct
temporal scales. As an example, Figure 5 contains the learning
rates for one individual subject. Here, the learning rate at the
second level (transformed according to Eq. 9) increases as the
reliability of the advice unexpectedly changes (blue line in
Figure 5). The learning rate at the third level (green line; see Eq.
11), however, fluctuates more slowly, and increases when the
adviser's intentions change from being consistently helpful to
being misleading. This illustrates that the learning rate adapts to
fluctuations in trial-by-trial advice reliability as well as to slower
fluctuations in adviser intentionality. By contrast, the RW model
assumes a constant learning rate $\alpha$ over trials. Figure 5 contrasts
this estimate of $\alpha$ with the trial-wise learning rates provided by the
HGF. This comparison suggests that, in a volatile environment,
such a constant learning rate is necessarily too high on many
trials.

Do subjects integrate social and non-social sources of 
information, or use one source of information 
exclusively? 

The family of response models proposing that participants
integrate both social and non-social sources of information (i.e.,
$M_1, M_4, M_7, M_{10}$) best explained participants' choices ($\varphi = 0.99$;
see Figure 4b and Table 6). That is, to predict the winning color,
most participants relied on both the uncertain advice and the
known outcome probabilities indicated by the visual pie chart.
However, the posterior parameter estimate of $\zeta$ was, on average,
significantly smaller than 0.5 ($p < 0.05$; see Table 7), suggesting
that participants weighted the visual pie chart information more
than the advice.

What drives trial-by-trial variability of decisions - 
Estimates of volatility or general decision noise? 

As explained above, we considered two possibilities vis-a-vis
how the player's beliefs might determine his actions trial-by-trial.
A standard approach is to use a sigmoidal function (softmax or
exponentiated Luce choice rule; see Eq. 14), which conveys
decision noise whose amount is fixed across trials. This approach is
used in our "Decision noise" response models with a fixed,
subject-specific parameter $\beta$. Alternatively, however, decision noise might
vary dynamically across trials as a function of higher-order beliefs,
such as the player's estimates of the adviser's volatility. This is
represented in our "Volatility" response model family, which
postulates that when the player estimates the adviser's intentions to
be stable, he responds in close accordance with his beliefs. On the
other hand, when the player's estimates of the adviser's volatility
increase, he behaves in a more exploratory manner, resulting in a
less deterministic (noisier) link between beliefs and responses. Our
model comparisons indicated that the second perspective provided
a better account of the data: the "Volatility" response model
family clearly outperformed the "Decision noise" model family
($p(r \mid y) = 0.94$; $\varphi = 0.99$; Figure 4c).



Do the model parameter estimates predict other aspects 
of behavior? 

We used both classical multiple regression and variational
regression to examine whether model parameter estimates of the
winning model ($M_1$) predicted scores of relevant psychological
traits, as measured by questionnaires, which the subjects completed
three days prior to the experimental session. Model parameter
estimates $\kappa$ and $\omega$ predicted players' scores on the IRI ($R^2$ = 50%,
F = 6.03, p < 0.02; log Bayes Factor (full versus null model) = 15.16;
Table 8). Thus, participants with a stronger tendency
to take into account the perspective of others during social
interactions showed (i) stronger coupling between inference on
advice accuracy and adviser volatility and (ii) more stable belief
updates about advice reliability (and adviser trustworthiness)
(Figure 6a). Notably, this link between parameter estimates and
independent questionnaire scores for model $M_1$ was absent for the
other models (p = 0.08 for the HGF with "Decision noise" and
p = 0.17 for the RW model).

Performance accuracy averaged at 73% ± 5% (mean ± standard
deviation), indicating that, on average, the players reached the
silver target and received CHF 10 bonus payment at the end of the
game. Perceptual model parameter $\vartheta$ and response model
parameter $\zeta$ predicted participants' performance accuracy
($R^2$ = 40%, F = 9.41, p < 0.01; log Bayes Factor (full versus null
model) = 17.59). Taken together, these results reflect that participants
who perceived the adviser's intentions to be more stable and
who weighted the social information more during decision-making
performed better in the task (Table 8; Figure 6b-c).

Does the player's behavior change with helpfulness of 
the adviser? 

Advisers also reached, on average, the silver target and received
CHF 10 payment at the end of the game. Across advisers, their
recommendation was correct in 74% ± 9.8% of all trials.

On debriefing, 4 of the 16 advisers reported a general intention
to help the players during the task; these advisers provided correct
recommendations on 85% ± 9.2% of the trials (note that the
information available to the advisers predicted wins with 80%
accuracy). The majority of the advisers (9 out of the 16), however,
aimed to increase their final pay-off and provided correct
recommendations on only 74% ± 1.6% of the trials. On the one
hand, the players who interacted with more helpful advisers
weighted the advice more, as indexed by larger $\zeta$ values, and
perceived the advisers' intentions as more stable, as indexed by
reduced $\vartheta$ values (p < 0.01; Figure 6c-d). On the other hand, the
players who interacted with advisers whose intentions changed
over the course of the game exhibited significantly larger $\kappa$ values
than the rest (Figure 6e). This suggests that there was a more
pronounced coupling between the two learning levels (advice
accuracy and adviser volatility) during interactions with advisers
whose intentions were changing.

Table 3. BMS results across optimization schemes (posterior model probability $p(r \mid y)$ of all models).

| Optimization algorithm | Response model | HGF with Volatility | HGF with Decision Noise | No Volatility HGF | Rescorla-Wagner |
|---|---|---|---|---|---|
| 1. GN | Integrated: Cue and Advice | 0.78 | 0.0274 | 0.0169 | 0.0166 |
| | Reduced Advice | 0.0419 | 0.0169 | 0.0165 | 0.0171 |
| | Reduced Cue | 0.0167 | 0.0166 | 0.0168 | 0.0165 |
| 2. GPO | Integrated: Cue and Advice | 0.6915 | 0.0747 | 0.0408 | 0.0190 |
| | Reduced Advice | 0.0165 | 0.0167 | 0.0279 | 0.0170 |
| | Reduced Cue | 0.0207 | 0.0178 | 0.0399 | 0.0177 |
| 3. MCMC | Integrated: Cue and Advice | 0.8173 | 0.0165 | 0.0165 | 0.0167 |
| | Reduced Advice | 0.0164 | 0.0167 | 0.0169 | 0.0163 |
| | Reduced Cue | 0.0168 | 0.0163 | 0.0164 | 0.0172 |

doi:10.1371/journal.pcbi.1003810.t003

To demonstrate the interpretability of our model parameter
estimates, we asked each player eight times at random points
during the game to explicitly rate the advisers as "helpful",
"uninformative" or "misleading". These ratings were coded such
that "helpful" corresponded to a probability of accurate advice of
1, "uninformative" to a probability of 0.5, and "misleading" to a
probability of 0. To relate the participants' ratings to the estimates
of advice reliability as inferred from the model, we used each
player's ratings as the outcome variable in a general linear model
with the explanatory variable being the prediction about the
advice reliability ($\hat{\mu}_1$). This proved to be highly significant (t = 5.92,
p < 0.0002) in a second level random effects regression analysis (see
Figure S4). In brief, the state estimates of our model correspond
well to the players' overtly expressed beliefs about the adviser's
intentions during the game. Notably, the same analysis using the
value of the advice as estimated by the RW model did not yield
significant results (p = 0.43). Altogether, this corroborates our
model comparison results and provides construct validity for our
model.
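The two-stage logic of the rating analysis can be sketched as follows: ratings are coded numerically, a regression slope is computed per player, and the slopes are then tested against zero across players. This is a simplified stand-in for the general linear model described above; the function names and the example numbers are invented.

```python
import math

# Coding of the explicit ratings as probabilities of accurate advice
RATING_CODE = {"helpful": 1.0, "uninformative": 0.5, "misleading": 0.0}


def regression_slope(x, y):
    """Least-squares slope of y on x (first-level, per-player estimate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def one_sample_t(values):
    """t statistic for the mean of values against zero (second-level test)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n)


# Hypothetical player: model predictions at three rating time points
model_prediction = [0.2, 0.5, 0.8]
ratings = [RATING_CODE[r] for r in ["misleading", "uninformative", "helpful"]]
slope = regression_slope(model_prediction, ratings)

# Hypothetical slopes from three players, tested at the group level
t_value = one_sample_t([1.0, 2.0, 3.0])
```

A positive group-level t statistic indicates that, across players, higher model-derived predictions of advice reliability go along with more favorable explicit ratings.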

Is behavior on the control task governed by different 
mechanisms? 

Distinct learning performance was observed in the control task
(where the adviser was blindfolded and presented his advice by
holding up a card sampled from a series of card decks, each of
which was, on average, either 80% or 20% accurate). The players
performed significantly worse in this task compared to the socially
interactive task (t(15) = 5.48, p < 0.00001), with performance
accuracy averaging at 64% ± 2.6%.

In this task, the BMS yielded different results compared to the
socially interactive task (see Table 9). More precisely, the three-
level HGF family ($M_1 \ldots M_6$) still outperformed non-hierarchical
models ($M_7 \ldots M_{12}$), such as the reduced HGF and the RW
model ($\varphi = 0.98$), suggesting that participants did incorporate time-
varying estimates of volatility (resulting from the switches among
card decks) into their beliefs about the advice accuracy.
Furthermore, the integrated response model family
($M_1, M_4, M_7, M_{10}$), which proposed that participants weigh both
social and non-social sources of information, explained participants'
responses better than reduced response models
($M_2, M_3, M_5, M_6, M_8, M_9, M_{11}, M_{12}$), according to which subjects
relied on one source of information only ($\varphi = 0.99$). In contrast to
the social setting, in the control task, the response model
prescribing volatility-driven mapping of beliefs to decisions did
not differ from the model that utilized a single decision noise
parameter $\beta$ ($\varphi = 0.54$). In other words, unlike in the social task,
decision noise might not change across trials as a function of
adviser volatility estimates.

With respect to the posterior parameter estimates (see Table 7),
there were notable differences between the two tasks: In the
control task, parameter $\zeta$ averaged at 0.28 ± 0.11; this was
significantly lower than in the social task (t(15) = 2.44, p < 0.02),
indicating that the players weighted the social (but unintentional)
advice significantly less than in the social task. That is, although
the card decks were more informative (80% predictability of wins/
losses) than the non-social cue (55-75% predictability), the players
relied more on the binary lottery information to predict the
outcome. The difference in the performance and the response
model parameters of these two tasks suggests that participants
performed better and relied more on the adviser's recommendations
when the adviser intentionally issued the advice.

Differences in advisers' strategies and players' individual 
learning trajectories 

In each participant, the model parameters describe an
individual learning trajectory (see Figure 7). As we debriefed each
adviser explicitly about the strategy that he employed during the
task, we were able to use these debriefings to examine how model-
based quantification of individual learning and inference reflected
the players' adaptive responses to the advisers' behavior during the
game.

Table 4. Bayesian model selection results (social interactive condition): Posterior model probability or $p(r \mid y)$ and the model
exceedance probability or $\varphi$.

| Response model | Measure | HGF with Volatility | HGF with Decision Noise | No Volatility HGF | Rescorla-Wagner |
|---|---|---|---|---|---|
| Integrated | $p(r \mid y)$ | 0.7800 | 0.0274 | 0.0169 | 0.0166 |
| | $\varphi$ | 0.9975 | 0.0001 | 0 | 0 |
| Reduced: Advice | $p(r \mid y)$ | 0.0419 | 0.0169 | 0.0165 | 0.0171 |
| | $\varphi$ | 0.0005 | 0 | 0 | 0 |
| Reduced: Cue | $p(r \mid y)$ | 0.0167 | 0.0166 | 0.0168 | 0.0165 |
| | $\varphi$ | 0 | 0 | 0 | 0 |

doi:10.1371/journal.pcbi.1003810.t004

Table 5. Family-level inference (perceptual model set): Posterior model probability or $p(r \mid y)$ and model exceedance probabilities $\varphi$.

| Measure | HGF with Volatility | HGF with Decision Noise | No Volatility HGF | Rescorla-Wagner |
|---|---|---|---|---|
| $p(r \mid y)$ | 0.7298 | 0.1091 | 0.1066 | 0.0545 |
| $\varphi$ | 0.9969 | 0.0024 | 0.0005 | 0.0002 |

doi:10.1371/journal.pcbi.1003810.t005

Four out of the 16 advisers provided accurate advice throughout
the game with advice reliability averaging at 85% ± 9.2%. Upon
debriefing, they reported that they aimed for the silver range from
the beginning, as they deemed it fair for both participants to reach
the silver target. The players who interacted with this subset of
advisers perceived their intentions to be stable over time and
weighted the advice more, as indexed by larger $\zeta$ values (see
Figure 6d). An example of such a player is given in Figure 7a
(subject SL_010), where the trajectory of estimated advice
reliability $\hat{\mu}_1$ indicates that this player's estimate of advice accuracy
stayed close to 90% throughout the game. For this subject, the
estimate of $\zeta$ was 0.54, indicating that he relied more on the advice
than the non-social cue when making his predictions.







Figure 4. Random effects family-level Bayesian model selection. (A) Posterior model probabilities pooled across perceptual
model families (i.e., HGF Volatility, HGF Decision Noise, No Volatility HGF and RW) indicate that the model class "HGF Volatility" explains participants'
responses best. (B) Posterior model probabilities pooled across all response model families (i.e., Integrated (Cue and Advice), Reduced: Advice Only,
and Reduced: Cue Only) indicate that the "Integrated" model family explains participants' responses best. (C) Posterior model probabilities across
models that propose that the mapping of beliefs onto response probabilities is achieved via trial-by-trial adviser volatility estimates (Volatility models)
or decision noise (Decision Noise models). The former was the winning family.
doi:10.1371/journal.pcbi.1003810.g004

Figure 5. Learning rates and the estimated advice accuracy. (A) The estimated probability of the advice accuracy is computed according to
the HGF and the RW perceptual models. In the RW model, the probability of advice accuracy is over- or under-estimated (compared to the HGF) on
trials where the volatility of the adviser's intentions changes; this is due to the constant learning rate $\alpha$. (B) The learning rates modeled in the HGF
(according to Eqs. 9 and 11) change over time as a function of the volatility of the adviser's intentions.
doi:10.1371/journal.pcbi.1003810.g005



Table 6. Family-level inference (response model set): Posterior model probability or $p(r \mid y)$ and model exceedance probabilities $\varphi$.

| Measure | Integrated | Reduced: Advice | Reduced: Cue |
|---|---|---|---|
| $p(r \mid y)$ | 0.8740 | 0.0731 | 0.0529 |
| $\varphi$ | 0.9998 | 0.0004 | 0.0004 |

doi:10.1371/journal.pcbi.1003810.t006

By contrast, another 3 of the 16 advisers were consistently
uninformative, exhibiting an average of only 56% ± 6.6% advice
reliability. Indeed, when they were debriefed, they described that
in order to reach the gold range, they attempted to confuse the
player from the beginning, preventing his progress bar from increasing
significantly throughout the game and maximising their own
chances to reach gold. This turned out to be a successful strategy,
as the advisers who used this strategy were the only ones who
reached the gold target. The players who interacted with this
subset of advisers showed a very distinct trajectory of learning from
those discussed above. First, $\zeta$ was low in all these players,
averaging at 0.19, indicating that they (rightfully) assigned little
weight to the advice and relied almost exclusively on the pie chart. A
representative subject is shown in Figure 7b (subject SL_005), for
whom the estimate of $\zeta$ was 0.12. Furthermore, a high value of
meta-volatility $\vartheta$ indicated that this participant perceived the
adviser's intentions as becoming more and more volatile over the
course of the interaction. Additionally, high levels of $\omega$ indicated
fast updating of beliefs about advice correctness over trials and
independently of estimates of adviser volatility. Again, this is a sign
of adaptive behavior: because the adviser is consistently uninformative
(random) in his advice, a high tonic learning rate for
updating estimates about advice validity in $x_2$ is appropriate. In
other words, this scenario describes an agent who perceives the
adviser's intentions as stochastic because the advice is uninformative.
In this scenario, the participant necessarily performs poorly
because he must base his decisions almost exclusively on the non-
social cue.

The largest subset of advisers (9 out of 16) used a strategy
that reflected the change in incentives induced by the payoff
scheme: advisers were helpful at the beginning of the game until
the players' progress bar reached their gold range. From this
point on, advisers began to mislead the players, preventing them
from moving beyond the gold range. Once the players detected
this change in intentionality, their score began increasing again.
This elicited another switch in the advisers' strategy, who now
resumed a helpful attitude in order to at least reach the silver
range. On average, the recommendations of these advisers were
74% ± 1.6% accurate. Players who interacted with these advisers
exhibited a learning trajectory that reflected the advisers'
dynamic incentives (see Figure 7c). For example, for the
representative subject shown in Figure 7c (subject SL_013), $\hat{\mu}_1$
steadily reached 0.9 after the first 80 trials, with a concomitant
decrease in estimates of the adviser's volatility $\mu_3$. Once the
adviser's intentions changed, the player updated his beliefs
accordingly, as reflected by an increase in the learning rate in $x_2$
and larger updates of $\hat{\mu}_1$. In this scenario, the player's adaptive
behavior takes into account both the volatility of the adviser's
intentions and the accuracy of his advice. This is reflected by high
values of the estimates for $\kappa$, $\omega$, and $\vartheta$.

Finally, to illustrate the capacity of our model-based approach
for characterizing individual differences, we show an unusual
subject in Figure 7d (subject SL_015). This player did recognize a
change in the adviser's intentions halfway through the game but
was much slower in updating his estimates of advice accuracy and
adviser's volatility than the subject discussed above. This is
because his prior beliefs were close to how the adviser actually
behaved for the first half of the game and because low estimates of
meta-volatility prevented a rapid response to the change in the
adviser's intentions. In other words, this participant remained
relatively confident about his prior estimate of the adviser's
volatility and expected to see little change over the course of the
social interaction.



Discussion 

The question of how we infer on others' intentions is a 
fundamental computational problem during social transactions. 
To examine this process, we extended a paradigm introduced by 
[14], turning it into an interactive social decision-making game in 
which each participant was assigned to a "player" or an "adviser" 
role. Critically, the game was designed to ensure that the adviser's
incentives to cooperate with or deceive the player varied, thus making
his intentions volatile.

While our paradigm was inspired by the previous work of 
Behrens and colleagues [14,42], it introduced two important 
advances. First, whereas Behrens et al. made subjects believe that 
the computer-generated advice was provided by a human being, 
our paradigm used real participants without any deception. This 
provides ecological validity and eschews potential ethical concerns, 
which makes the recorded trials from this paradigm more widely 
applicable, e.g., for future patient studies. Secondly, our paradigm 
allows for a wide range of interactions between agents, as neither the
adviser nor the player is restricted to specific
strategies during their interaction. The player can rely on the
binary lottery information (the non-social cue), the advice, or both 
when selecting his choices. Furthermore, on every trial, the adviser 
can also choose to provide either helpful or misleading advice, 
depending on whatever strategy he may be employing; in turn, 
these differences in strategy across advisers elicit differences in the 
adaptive behavior of the players. 

To explain the ensuing variability in adaptive behavior across 
subjects, we modeled the players' learning using a systematic set of 
alternative models that factorially combined different models of
learning behavior ("perceptual models") and decision-making 
("response models"). Using Bayesian model selection, we demon- 
strated that a hierarchical Bayesian model (the hierarchical 
Gaussian filter, HGF) with three levels best described the players' 
learning in the task. This suggests that participants updated their 
beliefs about advice reliability depending on an ongoing estimate 
of the volatility of the adviser's intentions, and that this estimate of 
volatility directly informed the trial-wise decisions. This three-level 
HGF outperformed simpler non-hierarchical models (such as 
Rescorla-Wagner), indicating that during social exchanges, par- 
ticipants employ a multilevel model of their environment and are 
capable of learning how others' intentions to be helpful or 
misleading fluctuate over time. These higher-order expectations 
are in turn exploited to update trial-by-trial predictions about 
advice reliability.
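
The difference between the two model classes can be illustrated with a minimal Python sketch. It assumes the standard binary HGF update for the second level; the function and variable names are illustrative and are not taken from the HGF toolbox. A Rescorla-Wagner learner applies a fixed learning rate, whereas in the HGF the size of the belief update grows with the current volatility estimate $\mu_3$.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued belief to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def rw_update(v: float, u: int, alpha: float) -> float:
    """Rescorla-Wagner update: the learning rate alpha is fixed."""
    return v + alpha * (u - v)


def hgf_level2_update(mu2: float, sigma2: float, mu3: float, u: int,
                      kappa: float, omega: float) -> tuple[float, float]:
    """One second-level update of a binary HGF (assumed standard form).

    mu2, sigma2: mean and variance of the belief about the advice tendency
    mu3:         current estimate of (log-)volatility
    u:           observed outcome (1 = advice correct, 0 = advice incorrect)
    kappa:       coupling between the volatility and tendency levels
    omega:       tonic component of the learning rate
    """
    mu1_hat = sigmoid(mu2)  # predicted advice accuracy
    # Predicted precision falls as the expected volatility rises.
    pi2_hat = 1.0 / (sigma2 + math.exp(kappa * mu3 + omega))
    pi2 = pi2_hat + mu1_hat * (1.0 - mu1_hat)
    # Precision-weighted prediction error.
    mu2_new = mu2 + (u - mu1_hat) / pi2
    return mu2_new, 1.0 / pi2


# Same outcome, two different volatility estimates (illustrative values).
step_stable, _ = hgf_level2_update(0.0, 1.0, -1.0, 1, kappa=0.5, omega=-4.0)
step_volatile, _ = hgf_level2_update(0.0, 1.0, 1.0, 1, kappa=0.5, omega=-4.0)
```

With these illustrative values, the same unexpected outcome moves the belief further when the adviser is currently believed to be volatile, which a fixed-rate learner cannot reproduce.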

Table 7. Average maximum a posteriori estimates of the free parameters in the winning models of the social and control tasks.

Social Interactive Task                    Control Task
Model: HGF with Volatility (M1)            Model: HGF with Decision Noise (M4)

Parameter      Mean     SD                 Parameter      Mean     SD
               0.48     0.53                              0.37     0.53
               1.06     0.27                              1.11     0.62
               0.42     0.61                              0.97     0.02
               1.05     0.13                              1.00     0.001
$\kappa$       0.31     0.29               $\kappa$       0.18     0.05
$\omega$       -5.92    2.93               $\omega$       -5.84    2.55
               0.44     0.27               $\vartheta$    0.47     0.06
               0.39     0.12                              0.28     0.11
$\beta$        4.86     1.86                              6.33     2.83

An important contribution of this paper is the translation of a
recent Bayesian framework for comparing alternative cognitive
models [15,16,22] to the domain of social interactions. The
implementation of this framework in the present study, however,
has one significant limitation: The present models aimed to
explain only the players' learning during the game, and not the
advisers'. That is, they neglected the recursive process of
perspective-taking (in other words, the player's belief about the
adviser's belief about the player's belief etc.), which occurs in many
social situations. Recent studies (see [2,12,13,43]) used recursive
theory-of-mind models to explain social inference in cooperative 
or multi-round trust games. These models propose that the 
expected value of a given action (e.g., to cooperate or to compete 
and to choose equitable or unfair offers) is a function of the other 
agent's strategy. Thus, players optimize their strategies or their 
depth of recursive reasoning by taking into account their 
opponents' future actions. For example, Yoshida and colleagues 
(2010a) showed that players who employ higher-order strategies, 
which take into account the opponents' future actions, forgo 
immediate rewards for options that lead to higher pay-off but 
require multi-player cooperation. 

One important future direction of our work is to extend the 
modeling of hierarchically coupled beliefs to take the depth of 
recursive perspective-taking into account. Having said this, the 
recursive depth of social inferences is typically limited [2,13]. For 
example, Xiang et al. (2012) classified the depth of subjects' 
reasoning during an investor-trustee game. Approximately half of 
195 investors were classified as strategic level 0 players, suggesting 
that they do not simulate their partner's play, while the other half 
employed either level 1 or level 2 depth-of-reasoning. Yoshida et
al. (2010a) also reported significant variability in recursive depth
across individuals. In the present study, as described in the Results
section, more than half (9/16) of the advisers reported having
adopted a fixed strategy (either consistently helpful or consistently
random throughout the game) and are thus unlikely to have
engaged in recursive perspective-taking. For the remaining
advisers, it is possible that modeling their belief updating processes
(in addition to those of the players) would lead to an even better 
prediction of the players' behavior. We will test this possibility in 
future work, contrasting models with and without the represen- 
tation of recursive interactions. 

As it did not incorporate recursive perspective-taking, our study 
focused on modeling the downstream consequences of the 
differential strategies that advisers employed. The players' belief-updating
process reflected the advisers' policy and determined
how much they were willing to take the advisers' suggestions into
account during decision-making. We found that players who
interacted with consistently helpful advisers perceived their
intentions to be stable over time and thus weighted their advice
more when predicting the outcome, as reflected by reduced values
in the meta-volatility parameter $\vartheta$ and larger $\zeta$ values, respectively.



Furthermore, players who interacted with advisers whose
intentions were changing to maximize their own winnings showed
more pronounced $\kappa$ values. This result suggests that the two
hierarchical learning levels were more strongly coupled in this 
subset of participants, and that the volatility estimate was used to 
update the beliefs about advice accuracy. Unlike in the case of 
consistently misleading advisers, in this particular social exchange, 
the volatility of the advisers' intentions was more traceable. Thus, 
players could benefit from inferring on the volatility of the 
advisers' intentions to predict the advice accuracy. 

Beyond reflecting the adviser's policy in the parameter 
estimates, our model exhibited construct validity in two ways: 
First, its posterior parameter estimates predicted participants' 
scores on the IRI, a questionnaire that they completed prior to
participation in the study. Players who described themselves as
proficient perspective-takers exhibited a more stable model of the
adviser, as reflected by the less pronounced tonic component of the
learning rate. Second, the model's posterior parameter estimates
also predicted the participants' explicit ratings of the advisers' 
helpfulness throughout the game. Notably, this relationship was
specific to the hierarchical Bayesian model: the parameter
estimates from the competing RW model did not show this
predictive capacity.

Figure 6. Construct validity of model parameters. The perceptual model parameters $\kappa$ and $\omega$ (A) predicted players' self-report scores on the
Interpersonal Reactivity Index (IRI). The perceptual model parameter $\vartheta$ and the response model parameter predicted players' performance accuracy (B
and C). Additionally, the perceptual model parameters $\kappa$ and $\vartheta$ and the response model parameter also predicted the strategy of the advisers with whom
players interacted (C-E).
doi:10.1371/journal.pcbi.1003810.g006

As described above, model comparison indicated that the
participants' behavior was best explained by a hierarchical model
in which estimates of volatility (of the adviser's intentions) played a
key role for belief updating. Furthermore, beyond inference and
with respect to the translation of beliefs to decisions, we found that
a response model in which participants' estimates of the volatility
of the advisers' intentions determined their trial-wise decisions
explained participants' choice behavior best. That is, the mapping
from beliefs to choices was increasingly deterministic the more the
player considered the adviser's intentions to be currently stable. By
contrast, when the player's estimates of the adviser's volatility
increased, the relation between beliefs and decisions became more
stochastic and the player exhibited more exploratory behavior.
This result demonstrates the direct relevance of volatility estimates
for determining trial-by-trial variability of decisions; note that this
is distinct from (and complementary to) our findings on the role of
volatility for learning and inference, described in the context of
comparing different perceptual models above. The finding that
volatility is an important factor determining trial-by-trial choice
variability goes beyond previous studies, which examined the
impact of volatility with respect to inference only (e.g., 
[14,22,25,44]). Moreover, in the context of social learning, these 
results stress the deployment of a hierarchical model and a key role 
of volatility estimates for both inference and decision-making. 
These are important in that they complement concepts of social 
learning, which emphasize the role of simple heuristics (e.g., [45]) 
or refer to non-hierarchical reinforcement learning (e.g., [46]). 
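
The dependence of choice variability on volatility can be sketched in a few lines of Python. The sketch assumes a response model in which the advice-based prediction and the cue probability are combined linearly with weight $\zeta$, and in which the inverse decision noise equals $\exp(-\hat{\mu}_3)$. Both the functional form and all names are assumptions made for illustration and should be checked against the model equations given earlier in this paper.

```python
import math


def integrated_belief(mu1_hat: float, cue_prob: float, zeta: float) -> float:
    """Combine social and non-social information (assumed linear weighting).

    mu1_hat:  predicted accuracy of the advice, in [0, 1]
    cue_prob: outcome probability indicated by the pie chart, in [0, 1]
    zeta:     weight given to the advice, in [0, 1]
    """
    return zeta * mu1_hat + (1.0 - zeta) * cue_prob


def choice_probability(belief: float, mu3_hat: float) -> float:
    """Probability of choosing in line with the integrated belief.

    The inverse decision noise is assumed to be exp(-mu3_hat), so a lower
    volatility estimate makes the mapping from belief to choice steeper.
    """
    beta = math.exp(-mu3_hat)
    num = belief ** beta
    return num / (num + (1.0 - belief) ** beta)


# Illustrative values: the same belief under low and high estimated volatility.
b = integrated_belief(mu1_hat=0.9, cue_prob=0.6, zeta=0.5)
p_stable = choice_probability(b, mu3_hat=-1.0)
p_volatile = choice_probability(b, mu3_hat=2.0)
```

Under these assumptions, an identical belief leads to a near-deterministic choice when the adviser is perceived as stable and to a choice closer to chance when the adviser is perceived as volatile.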

Similar to what Behrens and colleagues (2008) observed, we 
found that participants did not base their decision on a single 
source of information, but integrated the advice with information 
from the visual pie chart, which was also probabilistic but had a 
known outcome distribution. That is, the uncertainty of the 
information provided by the pie chart was directly given on each
trial, whereas the uncertainty of the advice had to be estimated 
online. This can be related, to some degree, to the distinction 
between risk and ambiguity [47-49]. 



Our modeling results show that participants were able to
trade off these different forms of uncertainty depending
on the type of adviser they faced (see Figures 6d-e and 7): when 
interacting with generally helpful advisers, most players consid- 
ered the advice strongly because, on average, it was more 
accurate than the visual pie chart. However, when they 
interacted with advisers who deliberately showed consistently 
uninformative (random) behavior, participants tended to 
discount their recommendation and relied more strongly on 
the visual pie chart. This is remarkable since it means that 
players did not display a uniform tendency to avoid ambiguity; 
instead, ambiguity aversion was restricted to interactions with 
an unhelpful adviser. 

Table 8. Predictive validity of model parameters: (a) Perceptual model parameters $\kappa$ and $\omega$ predicted participants' IRI scores.

Table 9. Results of Bayesian model selection (control condition): Posterior model probabilities p(r|y) and model exceedance probabilities.

Figure 7. The learning trajectories about advice accuracy and adviser volatility in several representative participants. (A) Subject
SL_010 interacted with a consistently helpful adviser; the parameter estimate of $\zeta$ was 0.54, suggesting that he took the advice into account more
than the non-social cue. (B) Subject SL_005 interacted with a consistently misleading adviser; the estimate of $\zeta$ was 0.12, indicating that he relied
almost exclusively on the non-social cue when predicting the outcome. Additionally, high levels of $\vartheta$ indicate that the player perceived the advice as
highly volatile over the course of the interaction. (C) Subject SL_013 interacted with an adviser who provided helpful advice at the beginning of the
game and then changed his strategy half-way through the game, offering misleading advice. This player adapted to changes in his environment, i.e., to
the advice accuracy and the adviser volatility. (D) Subject SL_015 interacted with an adviser who employed a similar strategy as in (C); however, the
estimate of $\omega$ was significantly reduced, suggesting that the player did not greatly change his beliefs during the interaction because his prior
estimates were consistent with the adviser's actual strategy.
doi:10.1371/journal.pcbi.1003810.g007

Additionally, we found that the different sources of information
(cue and advice) did not receive equal weight during decision-making.
Consistent with previous findings [50], we observed that
participants relied more on the non-ambiguous information (i.e.,
the non-social cue) compared to the advice. Previous models
[51,52] describing how people integrate social and non-social 
sources emphasized the importance of ambiguity that is intrinsic to 
social exchange: We are uncertain about how uncertain our 
appraisal of the other agent's intentions is. 

Previous work on uncertainty in repeated advice taking showed 
that, surprisingly, decision-makers do not become more confident 
in their choices with increasing advice accuracy [53]. Although we 
did not explicitly ask subjects to rate confidence or uncertainty,
our modeling results did take into account how their inferred 
estimates of uncertainty (about the adviser's intentions) informed 
their trial-wise decisions. 

Furthermore, analysis of a control condition in which trial-wise 
advice was randomly sampled (by a blindfolded adviser) from 
several decks of cards with either 80% or 20% accuracy suggested 
that participants relied more on the advice when it was intentional, 
as opposed to unintentional. This behavior was observed even 
though it was perfectly possible to extract predictive information 
from the card decks with the same accuracy as from a helpful 
adviser. Since players based their decisions more on the visual pie 
chart and did not take advantage of the advice, their performance 
was significantly lower than in the social condition. 

Beyond the results per se and their implications for concepts of
social learning, the modeling approach in this paper, with its
emphasis on inter-individual variability in inference and decision-making,
may prove useful for future studies of social learning. To
facilitate this, the HGF and the BMS routines are freely available
as open source MATLAB code (the HGF can be found at
www.translationalneuromodeling.org/tapas; the BMS routines are part
of the SPM software package: www.fil.ion.ucl.ac.uk/spm).

Finally, we believe that the approach presented here has 
potential for characterizing mechanisms of maladaptive behavior 
in individual patients. The present study in healthy volunteers 
provides a proof of concept of how individual mechanisms can be
elucidated in the context of social interactions, a domain where
many psychiatric disorders, including schizophrenia, are characterized
by particularly salient deficiencies [54]. For example, many
patients with schizophrenia exhibit a negative attribution bias
about others' intentions, which reflects the finding that negative
information is perceived as more diagnostic of another person's
true character than positive information [55-57]. One attractive
option is to use models such as the one described in this study for
computational phenotyping of patients from heterogeneous 
disorders [58-60]. For example, patients may show a diminished 
ability to dynamically infer on the intentions of others for different 
reasons: they may have overly tight prior beliefs about others' 
motivations, or they may suffer from an abnormality in belief 
updating mechanisms, which in turn could be due to aberrant 
computations of prediction error, precision or both (see Eq. 6 
above). In other words, models of cognition such as the one 
introduced here and in previous studies may prove useful to 
propose potential nosological dimensions with mechanistic interpretability
and disambiguate alternative mechanisms in individual
patients through model selection [61]. This study serves as a
precursor for future neuroimaging studies, in which we hope to 
(M1)) to the rest of the models across all subjects. The Bayes
log evidence difference >10), represent strong evidence that the
(TIF)

Figure S4 Linear regression analysis of the player-specific 
ratings of the advisers and the model estimates: We aimed to 
variable) using the estimates of advice reliability as inferred from
the model (explanatory variable). The plot contains the player-
specific ratings, trial-specific $\hat{\mu}_1$ values, and the player-specific beta

Video S1 The relationship between $\zeta$ and $\hat{\mu}_3$ in the response
model. Parameter $\zeta$ determines the weight of the advice, and
$\exp(-\hat{\mu}_3)$ represents the inverse of the adviser phasic volatility
estimate. As this inverse approaches $\infty$, the estimated
volatility of the adviser's intentions decreases, and decisions are
more consistent with the players' beliefs.
(MOV) 